comis

An R Package to Read CCCCO MIS Files

Christian Million
Data Analyst

Yosemite Community College District

Goals

Main Goal

  • Showcase benefits of developing internal packages with R.

Along the way…

  • Inspire!

  • Use comis as a motivating example

  • Why is package development worthwhile?

  • Learn too much about MIS files.

library(comis)

# Read Referential File
CB <- read_ref("path/to/CB223.txt")

# Read Submission File
XB <- read_sub("path/to/U59217XB.DAT")

What is comis?

An internally developed R package

Purpose

Read and Format:

  • MIS Submission Files

  • MIS Referential Files

MIS Data: What and Why?

MIS 101 - Submission Files

Every term, someone at your college converts student information system (SIS) data into .DAT files, using the file specs found in the Data Element Dictionary (DED).

These are submission files.

MIS 101 - Referential Files

After submission, colleges request referential files from the CCCCO.

These contain elements derived from submission files, explicit formatting, and additional student information.

MIS is Important Data

We want to analyze it!

Uses

  • Monitoring Student Success and Equity

  • Accountability

  • Categorical Funding (EOPS, DSPS, Perkins, …)

  • Student Centered Funding Formula

  • Research

The Challenge

Submission Files - Challenges

  • ~ 25 files | 396 elements

  • Fixed Width Format

  • No Column Names

  • Numbers that should be characters / dates

  • Missing values (NA)

  • Trailing white space

  • Implied decimal points
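
Several of the challenges above can be made concrete with a small base R sketch. The layout, the field names (`college`, `course_id`, `units`), the widths, and the two-implied-decimals rule are all made up for illustration; real layouts come from the Data Element Dictionary.

```r
# Hypothetical fixed-width layout: NOT a real MIS file specification.
widths    <- c(3, 12, 4)
col_names <- c("college", "course_id", "units")

# Three made-up records, padded to the widths above (19 characters each).
records <- c(
  sprintf("%s%-12s%s",  "591", "ENGL101", "0300"),
  sprintf("%s%-12s%s",  "592", "MATH20",  "0450"),
  sprintf("%s%-12s%4s", "592", "HIST17",  "")      # units left blank
)

# Turn widths into start / end positions, then slice every record.
ends   <- cumsum(widths)
starts <- ends - widths + 1

fields <- lapply(seq_along(widths), function(i) {
  x <- trimws(substring(records, starts[i], ends[i]))  # drop trailing white space
  x[x == ""] <- NA                                     # blank field -> missing value
  x
})
parsed <- as.data.frame(setNames(fields, col_names), stringsAsFactors = FALSE)

# Implied decimal point: "0300" is read as 3.00 (assuming two decimal places).
parsed$units <- as.numeric(parsed$units) / 100

parsed
```

Every field is sliced as text first, so codes such as `"591"` stay character; only `units` is converted to a number.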

Referential Files - Challenges

  • ~ 27 files | 406 elements

  • Tab Delimited :)

  • No Column Names

  • Numbers that should be characters / dates

  • Missing values (NA)

  • Trailing white space

  • Implied decimal points

  • Different date format than submission files
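
A tab-delimited file is easier to split, but the other challenges remain. The sketch below uses base R on made-up data. The column names (`code`, `start_date`) and both date formats (YYYYMMDD here, YYMMDD for the submission side) are assumptions chosen only to show the idea, not the formats the CCCCO actually uses.

```r
# Hypothetical two-column referential extract (tab delimited, no header row).
ref_text <- "0099\t20230815 \n0100\t20240110"

ref <- read.delim(
  text        = ref_text,
  header      = FALSE,                       # no column names in the file
  col.names   = c("code", "start_date"),     # so supply them by hand
  colClasses  = "character",                 # keep "0099" as text, not 99
  strip.white = TRUE,                        # drop trailing white space
  na.strings  = ""                           # empty field -> NA
)

# Assumed referential date format: YYYYMMDD.
ref$start_date <- as.Date(ref$start_date, format = "%Y%m%d")

# The same date as it might look in a submission file (assumed YYMMDD).
sub_date <- as.Date("230815", format = "%y%m%d")

ref
```

Reading everything as character and converting afterwards keeps leading zeros intact and makes each date conversion explicit.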

Yikes

  • A lot to re-remember

  • Cognitively taxing to implement

  • Takes time

  • Updates to multiple scripts

  • Copy / paste errors

  • Makes scripts more difficult to read

  • Unfulfilling

  • Lots of overhead before analysis can begin

Before comis

library(dplyr)
library(readr)

CB_col_names <- c('GI90', 'GI01','GI03', paste0("CB0",0:9), paste0("CB",10:27), "Filler")
CB_col_types <- rep("c", length(CB_col_names))
CB_col_width <- c(2,3,3,12,12,68,6,1,1,length(109:112),length(113:116),1,1,1,1,1,1,6,8,length(137:148),length(149:160),length(161:172),7,9,1,1,1,1,1,1,1,26)

XB_col_names <- c('GI90', 'GI01', 'GI03', 'GI02', 'CB01', paste0('XB0',0:9), 'XB10', 'XB11', 'XB12', 'CB00', 'Filler')
XB_col_types <- rep("c", length(XB_col_names))
XB_col_width <- c(2,3,3,3,12,6,1,6,6,1,length(44:47), length(48:51),1,1,1,1,length(56:61), 1, 12,7)

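# Submission files are fixed width, so the column widths are needed here
CB_src <- readr::read_fwf("path/to/U59223CB.dat",
                           col_positions = readr::fwf_widths(CB_col_width, CB_col_names),
                           col_types = paste(CB_col_types, collapse = ""),
                           trim_ws = TRUE)

XB_src <- readr::read_fwf("path/to/U59223XB.dat",
                           col_positions = readr::fwf_widths(XB_col_width, CB_col_names), # copy / paste errors
                           col_types = paste(XB_col_types, collapse = ""),
                           trim_ws = TRUE)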

# date_cleaning_code() and implicit_decimal_code() are placeholders for the real cleaning logic
CB <- CB_src |>
    mutate(dates = date_cleaning_code(),
           units = implicit_decimal_code())
           
XB <- XB_src |>
    mutate(dates = date_cleaning_code(),
           units = implicit_decimal_code())

After comis

library(comis)

CB <- read_sub("path/to/U59223CB.dat")
XB <- read_sub("path/to/U59223XB.dat")
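
One way a single call can replace all of that setup is to keep each file layout in a lookup table and pick the layout from the file name. The sketch below is not the actual comis implementation. `file_specs`, `guess_domain()`, and `lookup_spec()` are hypothetical names, the specs are truncated to the first few columns of the setup code above, and the rule that the two letters before the extension identify the file type is an assumption based on names like `U59223CB.dat`.

```r
# Hypothetical spec registry: one entry per file type, truncated for brevity.
file_specs <- list(
  CB = list(names = c("GI90", "GI01", "GI03"),         widths = c(2, 3, 3)),
  XB = list(names = c("GI90", "GI01", "GI03", "GI02"), widths = c(2, 3, 3, 3))
)

# Assumption: the two letters before the extension identify the file type.
guess_domain <- function(path) {
  stem <- tools::file_path_sans_ext(basename(path))
  toupper(substring(stem, nchar(stem) - 1, nchar(stem)))
}

# Look the spec up once, in one place, instead of in every script.
lookup_spec <- function(path) {
  domain <- guess_domain(path)
  spec   <- file_specs[[domain]]
  if (is.null(spec)) stop("No spec registered for file type: ", domain)
  spec
}

lookup_spec("path/to/U59223XB.dat")$names
```

With the specs stored once, a copy / paste mix-up between CB and XB column names cannot happen at the call site.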

Additional Features

  • Contains useful data found on CCCCO websites

  • Read many files at once

  • Read from repo

  • Use DED Name or Descriptive Name

library(dplyr)
library(comis)

read_ref_repo("CB", c("217", "223")) |>
    left_join(top_codes, by = c("CB03" = "top_code")) |>
    left_join(colleges, by = c("GI01")) |>
    filter(vocational == "Y",
           institution == "COLUMBIA")

library(comis)

# Reads many files of same "domain" at once
read_sub(c("U59223CB.DAT", "U59217CB.DAT"))


read_ref(c("CB217.txt", "CB223.txt"))
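
Reading several files of the same type usually comes down to "read each one, then stack the results". The base R sketch below shows that pattern under stated assumptions: `read_one()` and `read_many()` are hypothetical helpers, not comis functions, and the two-column layout and the file contents are made up.

```r
# Hypothetical reader for one tab-delimited file without a header row.
read_one <- function(path) {
  out <- read.delim(path, header = FALSE, colClasses = "character",
                    col.names = c("GI01", "CB01"))
  out$source_file <- basename(path)  # remember where each row came from
  out
}

# Vectorised wrapper: read every path, then stack the data frames.
read_many <- function(paths) {
  do.call(rbind, lapply(paths, read_one))
}

# Demo with two temporary files standing in for two terms.
demo_dir <- tempfile("mis_demo_")
dir.create(demo_dir)
paths <- file.path(demo_dir, c("CB217.txt", "CB223.txt"))
writeLines("591\tENGL101", paths[1])
writeLines(c("591\tENGL101", "592\tMATH20"), paths[2])

combined <- read_many(paths)
combined
```

Keeping the source file name as a column makes it possible to tell terms apart after the files are combined.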

library(comis)

# Set in .Rprofile
options(comis.repo.referential = "path/to/ref/repo/")

read_ref_repo("CB", c("217", "223"))
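
A repo option like this can be turned into file paths with `getOption()` and `file.path()`. The helper below, `ref_repo_paths()`, is a hypothetical sketch rather than part of comis, and the `<type><term>.txt` naming is an assumption based on file names such as `CB223.txt`.

```r
# Hypothetical helper: build file paths from the repo option, a file type,
# and one or more term codes.
ref_repo_paths <- function(type, terms) {
  repo <- getOption("comis.repo.referential")
  if (is.null(repo)) {
    stop("Set options(comis.repo.referential = ...) first.")
  }
  repo <- sub("/+$", "", repo)                  # tolerate a trailing slash
  file.path(repo, paste0(type, terms, ".txt"))  # assumed naming scheme
}

options(comis.repo.referential = "path/to/ref/repo/")
ref_repo_paths("CB", c("217", "223"))
```

Storing the location in an option means scripts name only the file type and terms, and the repo can move without any script changing.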

library(comis)

# Column names are DED Codes.
# like "GI01", "CB00", "CB01"
read_ref_repo("CB", "217")

# Column names are words.
# like "COLLEGE_ID", "COURSE_ID", "CONTROL_NUMBER"
read_ref_repo("CB", "217", desc = TRUE)
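
Switching between DED codes and descriptive names can be done with a named lookup vector. In the sketch below, `ded_lookup` and `use_desc_names()` are hypothetical, and the pairing of each code with a descriptive name is an illustrative guess rather than something taken from comis or the DED.

```r
# Lookup from DED codes to descriptive names (pairings are illustrative guesses).
ded_lookup <- c(GI01 = "COLLEGE_ID",
                CB00 = "CONTROL_NUMBER",
                CB01 = "COURSE_ID")

# Rename only the columns found in the lookup; leave the rest untouched.
use_desc_names <- function(df, lookup = ded_lookup) {
  known <- names(df) %in% names(lookup)
  names(df)[known] <- unname(lookup[names(df)[known]])
  df
}

df <- data.frame(GI01 = "591", CB01 = "ENGL101", Filler = "",
                 stringsAsFactors = FALSE)
names(use_desc_names(df))
```

Because the data itself is unchanged, the same reader can offer both naming styles behind a single argument such as `desc`.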

Benefits of comis

  • Easier to tell what’s happening

  • Reduces cognitive overhead

  • Get to analysis faster and with more confidence

  • Documentation contained within the package

  • Updates made in one spot (instead of throughout various scripts)

  • Shifts focus to what’s important: using the data

Why Develop Internal R Packages?

  • Addresses problems specific to the institution

  • Reasonable defaults

  • Abstracts common tasks

  • Maintainable

  • Share code with others

  • Documentation / Vignettes

Other “Internal” R Packages

Examples

  • DisImpact (“internal” to CCCCO)

  • yccdDB (creates and manages DB connections / queries)

  • hub (.Rmd / .qmd storage and usage monitoring)

Ideas

  • Read / process other CCC files: SEA, VFS, SCFF, etc.

  • yccdTemplates (project / analysis / report templates)

  • yccdThemes (branding graphs / reports)

  • yccdTerms (help with term math / formatting)

Thanks!
